The complete genome sequences of the human coronavirus OC43 (HCoV-OC43) laboratory strain from the
American Type Culture Collection (ATCC) and of an HCoV-OC43 clinical isolate, designated Paris, were
obtained. Both genomes are 30,713 nucleotides long, excluding the poly(A) tail, and differ by only 6
nucleotides. These six mutations are scattered throughout the genome and give rise to only two amino
acid substitutions: one in the spike protein gene (I958F) and the other in the nucleocapsid protein
gene (V81A). Furthermore, the two variants were shown to reach the central nervous system (CNS) after
intranasal inoculation in BALB/c mice, demonstrating neuroinvasive properties. Even though the ATCC
strain could penetrate the CNS more effectively than the Paris 2001 isolate, these results suggest
that intrinsic neuroinvasive properties already existed for the HCoV-OC43 ATCC human respiratory
isolate from the 1960s before it was propagated in newborn mouse brains. They also demonstrate that
the molecular structure of HCoV-OC43 is very stable in the environment (the two variants were
isolated ca. 40 years apart) despite virus shedding and chances of persistence in the host. The
genomes of the two HCoV-OC43 variants display 71, 53.1, and 51.2% identity with those of mouse
hepatitis virus A59, severe acute respiratory syndrome human coronavirus Tor2 strain (SARS-HCoV
Tor2), and human coronavirus 229E (HCoV-229E), respectively. HCoV-OC43 also possesses well-conserved
motifs with regard to the genome sequence of SARS-HCoV Tor2, especially in open reading frame 1b.
These results suggest that HCoV-OC43 and SARS-HCoV may share several important functional properties
and that HCoV-OC43 may be used as a model to study the biology of SARS-HCoV without the need for
level three biological facilities.


Human coronaviruses (HCoVs), members of the Coronaviridae family, are ubiquitous in the
environment and are responsible for up to one-third of common colds (41). In the past few
years, we have provided experimental evidence that these viruses possess neurotropic and
neuroinvasive properties: they persist in neural cell cultures (7, 8) and human brains (9).
Of the two HCoV serotypes available, HCoV-OC43 was selected for further characterization of
persistence in the nervous system because of its more efficient infection of primary neural
cell cultures (11), as well as a trend toward association with neurological disease (9).

Coronaviruses are enveloped viruses that possess a positive-strand RNA genome of up to 31 kb,
which represents the largest known genome among all RNA viruses (35). This genome comprises
several genes encoding structural and nonstructural proteins. Among these proteins, the S
protein is biologically very important because it could be implicated in determination of
tropism (3) and its modulation (50). Indeed, the S protein could be associated with the
capacity of the virus to reach the central nervous system (CNS) and possibly trigger
neurological disorders (9, 22). It could also be responsible for conferring the strong
degree of host species specificity observed with coronaviruses (28).
Only the 3' one-third of the HCoV-OC43 genome has been sequenced over the years. Therefore,
until now, the complete sequence of open reading frame 1a (ORF1a) and ORF1b, together known
as the replicase gene, was still undetermined. This gene is essential for coronavirus
survival because it contains several motifs that could be involved in important viral
functions such as transcription, replication, and pathogenesis (66). The products encoded by
these two ORFs are polyprotein precursors, which are processed by two or three different
proteinases encoded by ORF1a. These proteinases could include two papain-like proteases
(PLP1 and PLP2) and a poliovirus 3C-like protease (3CLpro), which presents the most
important cleavage activity. The essential function of 3CLpro is reflected by its capacity
to cleave at many sites in the replicase polyproteins and to release the key replicative
functions, such as the RNA-dependent RNA polymerase (RdRp) and the RNA helicase (67).

The HCoV-OC43 strain belongs to the second genetic group, just as SARS-HCoV apparently does
(51). The latter is responsible for the severe acute respiratory syndrome (SARS), which is a
life-threatening form of pneumonia (46). Since the outbreak of SARS in the fall of 2002 (60),
a lot of work has been done to sequence the entire genome of the virus (34) and to
understand the mechanisms underlying virus pathogenesis. As presented here, the whole genome
of HCoV-OC43 has now been sequenced and, since this human strain is the most closely related
to SARS-HCoV, it could be used as a model for the study of SARS-HCoV without the drawbacks
of level three biological confinement. Comparisons with the SARS-HCoV nucleotide and amino
acid sequences (34) revealed that the two viruses share extensive homology in some important
motifs involved in viral replication and pathogenesis. Indeed, the most significant homology
between the genome of the HCoV-OC43 strain and that of the SARS-HCoV Tor2 isolate is found
in the ORF1b region, which comprises the RdRp and helicase motifs (16). The 3CLpro motif of
HCoV-OC43 also displays an important level of identity with that of SARS-HCoV. This finding
is noteworthy since SARS-HCoV 3CLpro thus far represents the most promising target for SARS
therapy (58).

We report here the complete genome sequences of the HCoV-OC43 strain from the American Type
Culture Collection (ATCC), as well as of an HCoV-OC43 respiratory clinical isolate,
designated HCoV-OC43 Paris. Both genomes are 30,713 nucleotides (nt) long, share the same
genomic organization, and differ by only 6 nt. Differences found in the genome of the
HCoV-OC43 Paris isolate, compared to the genome of HCoV-OC43 ATCC, give rise to only two
amino acid substitutions, which are located in the S (I958F) and the N (V81A) protein genes.
After intranasal inoculation in BALB/c mice, the HCoV-OC43 ATCC strain, as well as the Paris
isolate, reached the CNS, where they replicated and disseminated, although mice were
apparently more easily infected with the ATCC strain than with the Paris isolate. These
results suggest that both viruses possess the ability to reach and infect neural cells in
vivo. The fact that a natural OC43 isolate has an intrinsic capacity to invade and replicate
within the mouse CNS also suggests that the HCoV-OC43 ATCC strain did not acquire its
neuroinvasive properties through propagation in newborn mouse brains. Bioinformatics
analyses were also performed on the HCoV-OC43 genome. These analyses showed that this virus
strain is closely related to mouse hepatitis virus A59 (MHV-A59) and that it displays
significant identity levels with important functional domains of SARS-HCoV. These data
provide evidence that HCoV-OC43 could be used as a model for the study of other group 2
coronaviruses, including SARS-HCoV, and that it will facilitate understanding of the biology
of this emerging viral strain.

MATERIALS AND METHODS 

Viruses and cell lines. The ATCC HCoV-OC43 strain (ATCC number VR-759), isolated in the
1960s, and the Paris clinical respiratory isolate, isolated in March 2001, were grown on the
HRT-18 cell line (human rectal adenocarcinoma) as described previously (37). The clinical
sample (HCoV-OC43 Paris) was isolated from the respiratory tract of a 68-year-old
immunocompromised male who had no connection whatsoever with laboratory work and was not in
contact with any laboratory workers who had manipulated the HCoV-OC43 ATCC virus. A reverse
transcription-PCR (RT-PCR) was performed to specifically detect the presence of HCoV-OC43
RNA, and an aliquot of the clinical sample was then used to infect the HRT-18 cell line. The
HCoV-OC43 ATCC strain and the Paris isolate were never cultured at the same time, and
stringent laboratory precautions were used in order to eliminate possible
cross-contamination.

Acute infections of cells. Cells were infected at a multiplicity of infection of 
0.02 and 0.2 for the ATCC strain and Paris isolate, respectively. The fifth passage 
of the ATCC strain and the eighth passage of the Paris isolate were used to 
perform the infections. Cell lines at 70% confluence were infected with the 
appropriate virus stock in the presence of TPCK (tolylsulfonyl phenylalanyl 
chloromethyl ketone)-treated trypsin (10 U/ml; Sigma-Aldrich Canada, Ltd.) 
and 1% (vol/vol) heat-inactivated fetal calf serum and then incubated at 33°C for 
4 days in a humidified 5% (vol/vol) CO2 atmosphere.


Mice and inoculations. In order to determine the susceptibility of mice to an infection by
the HCoV-OC43 ATCC and HCoV-OC43 Paris variants, MHV-seronegative 14-day-postnatal BALB/c
mice (Charles River Laboratories, St-Constant, Quebec, Canada) were inoculated intranasally
with 5 µl of a virus stock solution containing 10^6 50% tissue culture infective doses
(TCID50)/ml. Five mice inoculated with each variant (HCoV-OC43 ATCC or HCoV-OC43 Paris) were
sacrificed every 2 days postinfection (dpi) and processed for detection of infectious virus
particles. Every 2 days, two mice infected by HCoV-OC43 ATCC were processed for
immunohistochemical detection of viral antigens.
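The dose received by each animal follows directly from the inoculum volume and the stock titer given above. The short Python sketch below works through that arithmetic; the function and parameter names are illustrative and are not part of the original methods.

```python
def dose_per_animal(volume_ul: float, titer_per_ml: float) -> float:
    """Return the infectious dose (in TCID50) contained in `volume_ul` microliters."""
    # 1 ml = 1000 µl, so scale the per-ml titer by the inoculated volume.
    return volume_ul * titer_per_ml / 1000.0


# 5 µl of a stock at 10^6 TCID50/ml, as stated in the text.
dose = dose_per_animal(volume_ul=5, titer_per_ml=1e6)
print(f"{dose:.0f} TCID50 per mouse")
```

Under these figures each mouse receives on the order of 5 x 10^3 TCID50.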

Immunohistochemistry. Mice were perfused by intraventricular injection of 4% (vol/vol)
paraformaldehyde, under deep ketamine-xylazine anesthesia, as previously described (22).
Brains were dissected and sectioned at a thickness of 40 µm with a Lancer Vibratome.
Sections were collected in 0.05 M Tris-buffered saline and then incubated for 2 h at 37°C in
a 1/1,000 dilution of an ascites fluid from mouse MAb 1-10C.3, directed against the spike
protein of HCoV-OC43 (7). Sections were then rinsed and processed with a Vectastain ABC kit
(Vector Laboratories, Burlingame, Calif.). Labeling was revealed with 0.03% (wt/vol) DAB
solution (Sigma) and 0.01% (vol/vol) H2O2, which yielded a dark brown product.

Infectious virus assays. Brain and lung were dissected, homogenized in 10% (wt/vol) sterile
phosphate-buffered saline (PBS), and centrifuged at 4°C for 20 min at 1,000 × g, and then
supernatants were immediately frozen at -80°C and stored until assayed. The extracts were
processed for the presence and quantification of infectious virus by an indirect
immunoperoxidase assay, as previously described (22). Briefly, HCoV-OC43-susceptible HRT-18
cells were inoculated with serial logarithmic dilutions of each tissue sample. After 4 days
of incubation at 33°C in a humidified 5% (vol/vol) CO2 atmosphere, the cells were washed in
PBS and fixed with 0.3% (vol/vol) hydrogen peroxide (H2O2) in methanol. After being washed
with PBS, they were incubated for 2 h at 37°C in a 1/1,000 dilution of an ascites fluid from
mouse MAb 1-10C.3. Afterward, cells were washed in PBS, and horseradish peroxidase-goat
anti-mouse immunoglobulins (Dako; Diagnostics Canada, Inc., Mississauga, Ontario, Canada)
were added, followed by incubation for 2 h at 37°C. Antibody complexes were detected by
incubation in DAB (Sigma) with 0.01% (vol/vol) H2O2.

RNA extraction, RT, and PCR. After infection, the cells were washed with PBS, and the total
RNA was extracted from the cells by using the GenElute Mammalian Total RNA miniprep kit
(Sigma-Aldrich) as recommended by the manufacturer. The RNA was then quantified, and 3 µg
was directly used for RT with Moloney murine leukemia virus reverse transcriptase
(Invitrogen). For each RT, 500 ng of oligo(dT) primer and 0.5 mM deoxynucleoside
triphosphates (Amersham Biosciences) were used, and the reactions lasted between 50 and 60
min at 37°C. Then, 2 µl of the RT cDNA was used to perform the PCR amplifications. The
Expand High-Fidelity Taq polymerase (Roche) was used to amplify the HCoV-OC43 genome in six
segments, in combination with the primers listed in Table 1. All amplifications were
performed by using the Cetus DNA thermal cycler (Perkin-Elmer/Applied Biosystems), and an
appropriate annealing temperature was used for each specific reaction. Except for the PCR
JUB3-12, which required a higher annealing temperature of 65°C, all other annealing
temperatures used corresponded to the melting temperature of the primers. For each PCR
amplification, at least six reactions were performed, pooled together, migrated on a 0.8%
(wt/vol) agarose gel (SeaKem), and gel extracted by using the Qiaex II gel extraction kit
(Qiagen) prior to sequencing.

RACE and cloning. Rapid amplification of cDNA ends (RACE), cloning, and sequencing were
performed for both the 5' and 3' ends of the HCoV-OC43 ATCC strain and the HCoV-OC43 Paris
isolate. Primers from the kit used for the RACE are listed in Table 1. An RT reaction of the
5' end was performed by using the GeneRacer kit (Invitrogen) as recommended by the
manufacturer, whereas RT of the 3' end was performed by using only the GeneRacer oligo(dT)
primer provided in the kit (Table 1). In order to amplify both ends, primers from the kit
were used in combination with primers specific for the HCoV-OC43 genome. Therefore, the
GeneRacer 5' nested primer was used with the JUB2 primer, and the GeneRacer 3' nested primer
was combined with the JUMO1 primer for the ATCC strain and the JU08 primer for the Paris
isolate. Amplicons of the 5' ends of both viruses and of the 3' end of the Paris isolate
were cloned by using the Zero Blunt TOPO PCR cloning kit for sequencing (Invitrogen),
whereas amplicons of the 3' end of the ATCC strain were cloned by using the TOPO XL PCR
cloning kit. The RACE 5' clones were sequenced by using M13 universal forward and reverse
primers and the RACEJUB1 and RACEJUB2 primers, and RACE 3' clones were sequenced by using
M13 universal forward and reverse primers and the JU07 primer.

Sequencing. Sequencing reactions were performed by Bio S&T (Montreal, Quebec, Canada) by
using the dideoxy method (Sanger) and specific primers, which are listed in Table 2. As
described above, PCR products were directly sequenced for both genomes, and both strands
were sequenced in each case, including RACE clones. For each genome, at least two RACE 5'
and 3' clones were sequenced. Sequences obtained from chromatograms were aligned by using
the basic local alignment search tool (BLAST; bl2seq) from the National Center for
Biotechnology Information and were analyzed by using the Chromas 2 software.

TABLE 1. Primers used for amplification of the HCoV-OC43 genome

Primer combination (nt location)                            Target region or sequence                                Amplicon length (bp)
JUB3-JUB12 (1-20 and 6071-6091)                             Leader, 5'UTR, and ORF1a                                 6,091
JUB5-JUB6 (5319-5339 and 11111-11131)                       ORF1a                                                    5,813
JUB7-JUB8 (10901-10921 and 16525-16545)                     ORF1a and ORF1b                                          5,645
JUB9-JUB10 (16309-16329 and 21544-21564)                    ORF1b and ns2                                            5,256
JUNS01-JUS02 (21330-21350 and 27754-27774)                  ORF1b, ns2, HE, and S genes                              6,445
JUMO1-GeneRacer 3' nested (27649-27669 and 30742-30764)     S; ns12.9; E, M, and N genes; and 3'UTR                  3,116^a
GeneRacer oligo(dT)                                         5'-GCTGTCAACGATACGCTACGTAACGGCATGACAGTG(T)18-3'
GeneRacer 5' nested                                         5'-GGACACTGACATGGACTGAAGGAGTA-3'
GeneRacer 3' nested                                         5'-CGCTACGTAACGGCATGACAGTG-3'

a This value assumes a poly(A) tail of 28 bp.

Bioinformatics analyses. Bioinformatics analyses were performed by Sequence Bioinformatics
(Montreal, Quebec, Canada). The BLAST program was used to perform genome versus genome and
gene versus genome alignments. RNA folding was analyzed by using MFOLD. PHYLIP was used for
phylogenetic tree construction. The FASTA-formatted sequences of the complete genomes were
aligned with CLUSTAL W (v1.82) by using the default parameters for DNA alignments. The
PHYLIP output option of CLUSTAL W was used to produce a multiple alignment file that was
used as input for dnaml (v3.6), which produced an unrooted maximum-likelihood phylogenetic
tree with the default parameters. ORF analysis was performed by using tools from the EMBOSS
suite. In the case of SARS-HCoV and HCoV-OC43 ATCC, the extracted ORFs were submitted to
HMMPFAM, of the HMMER suite, for motif detection against the PFAM database. The amino acid
sequences of the known expressed proteins were also submitted to HMMPFAM and to the
patmatmotifs tool of EMBOSS, which performs motif scanning against the PROSITE motif
database.
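To make the notion of "percent identity" used throughout these comparisons concrete, the following Python sketch computes identity over the ungapped columns of a pairwise alignment. It is only an illustration on a toy alignment: it is not the BLAST or CLUSTAL W procedure used in the study, those programs may count gapped columns differently, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not taken from any coronavirus genome).
seq_a = "ATGGC-TTACG"
seq_b = "ATGACGTTTCG"
print(round(percent_identity(seq_a, seq_b), 1))
```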

Nucleotide sequence accession numbers. The GenBank sequence accession numbers for the
complete genomes of the HCoV-OC43 ATCC strain and the Paris isolate are AY585228 and
AY585229, respectively.


RESULTS 

Amplification and sequencing of HCoV-OC43 ATCC and Paris genomes. The genomes of the
HCoV-OC43 ATCC strain and of the Paris isolate were amplified in six fragments by RT-PCR in
order to be sequenced (Fig. 1 and Table 1). The PCR products encompassed the entire genome
of the viruses and overlapped each other to make sure the final sequences were complete.
Primers used for the amplifications of ORF1a and ORF1b were created by using the sequence of
the bovine coronavirus Quebec strain (BCoV Quebec) (63), which displays 97% identity in this
region of the genome and was known to share 92% identity with the 3' 9 kb of the HCoV-OC43
genome (24, 26, 29, 37, 38, 39). Primers used to amplify the 3' region were designed on the
basis of sequences of HCoV-OC43 available in GenBank. The gene-walking approach was used to
sequence the whole genome of the HCoV-OC43 ATCC strain, whereas the Paris isolate was
sequenced by using the primers generated during the sequencing of the ATCC strain (Table 2).
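The claim that the six PCR products overlap and span the whole genome can be checked from the primer coordinates in Table 1. The Python sketch below recomputes each amplicon length and the overlap between consecutive amplicons; the coordinates are copied from Table 1, while the variable and function names are illustrative only.

```python
# (primer pair, first nt, last nt), 1-based inclusive coordinates from Table 1.
AMPLICONS = [
    ("JUB3-JUB12", 1, 6091),
    ("JUB5-JUB6", 5319, 11131),
    ("JUB7-JUB8", 10901, 16545),
    ("JUB9-JUB10", 16309, 21564),
    ("JUNS01-JUS02", 21330, 27774),
    ("JUMO1-GeneRacer 3' nested", 27649, 30764),
]


def amplicon_lengths(amplicons):
    """Length in bp of each amplicon (inclusive coordinates)."""
    return [end - start + 1 for _, start, end in amplicons]


def consecutive_overlaps(amplicons):
    """Overlap in nt between each amplicon and the next; a value <= 0 would mean a gap."""
    return [
        prev_end - next_start + 1
        for (_, _, prev_end), (_, next_start, _) in zip(amplicons, amplicons[1:])
    ]


print(amplicon_lengths(AMPLICONS))
print(consecutive_overlaps(AMPLICONS))
```

The recomputed lengths match the amplicon lengths listed in Table 1, and every pair of neighboring amplicons shares more than 100 nt.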

Main features of the HCoV-OC43 genome. The genomes of the two variants contain 30,713 nt,
excluding the poly(A) tail, and include nine main ORFs flanked by 5' (nt 1 to 209)- and 3'
(nt 30426 to 30713)-untranslated regions (UTRs) (Fig. 1 and Table 3). The genome of
HCoV-OC43 contains multiple secondary ORFs, scattered throughout the genome in all frames
and in both orientations (data not shown). By using ShowORF software, it was possible to
determine that the translation complex could potentially use several of these ORFs in the
5'→3' orientation with, for instance, a translation reinitiation mechanism (20, 36). By
comparison with better-characterized coronaviruses, putative transcription-regulating
sequences (TRSs) of HCoV-OC43 were also identified (4, 44, 58) (Table 3). These sequences
are found at the 5' end of each viral RNA, genomic or subgenomic, and represent signals for
the discontinuous transcription of subgenomic mRNA (49). The identified canonical core
sequence for HCoV-OC43 was 5'-UCUAAAC-3', but it was not always perfectly conserved
throughout the genome (Table 3).
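The degree to which the TRS core is conserved can be quantified as the number of mismatches against the canonical 5'-UCUAAAC-3' sequence. The Python sketch below does this for the Table 3 entries whose TRS location spans 7 nt. It rests on an assumption: the TRS is taken to be the first seven nucleotides of each listed sequence, which is consistent with the TRS and gene coordinates in Table 3 but is inferred here because the boldface marking of the original table is not available. The HE and N entries, whose TRS locations span only 6 nt, are left out because their alignment to the core is ambiguous.

```python
CANONICAL_CORE = "UCUAAAC"

# First 7 nt of the TRS sequences listed in Table 3 (assumed to be the TRS itself).
TRS_BY_REGION = {
    "leader": "UCUAAAC",
    "ns2": "UCUAAAC",
    "S": "UCUAAAC",
    "ns12.9": "UCUUAAG",
    "M": "UCCAAAC",
}


def mismatches(core: str, candidate: str) -> int:
    """Number of positions at which two equal-length sequences differ (Hamming distance)."""
    if len(core) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(core, candidate))


mismatch_counts = {
    region: mismatches(CANONICAL_CORE, trs) for region, trs in TRS_BY_REGION.items()
}
print(mismatch_counts)
```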

Using bioinformatics tools and well-characterized coronaviruses (15, 18, 67), it was also
possible to draw a precise map of the main domains contained within the polyprotein 1ab and
to determine the location of the putative viral proteolytic cleavage sites (Fig. 2). Most of
the main motifs found throughout the genome after bioinformatics analysis corresponded to
expected motifs found in other coronaviruses. Indeed, the PLP1 and PLP2 motifs, the
membrane-spanning domains (TM), and the 3CLpro motif, as well as the RdRp and the RNA
helicase motifs, were found in ORF1ab at the expected positions compared to other
coronaviruses. Since HCoV-OC43 and BCoV possess a high degree of identity, the positions of
the cleavage sites were determined with the BCoV model (15), and 14 sites were identified in
the polyprotein 1ab. Among these cleavage sites, three are recognized by PLP1 or PLP2, and
the 11 others are recognized by 3CLpro, generating mature products containing key motifs for
viral transcription and replication. A putative ribosomal -1 frameshift was also identified
upstream





TABLE 2. Primers used for sequencing of the HCoV-OC43 genome^a


Positive-strand 

primer 


JUB3 

RACEJUB2 

JUB31 

JUB32 

JUB33 

JUB14 

JUB141 

JUB142 

JUB143 

JUB144 

JUB161 

JUB163 

JUB164 

JUB165 

JUB166 

JUB5 

JUB51 

JUB52 

JUB53 

JUB54 

JUB17 

JUB171 

JUB172 

JUB173 

JUB55 

JUB56 

JUB57 

JUB58 

JUB7 

JUB71 

JUB72 

JUB73 

JUB74 

JUB75 

JUB76 

JUB77 

JUB78 

JUB79 

JUB710 

JUB711 

JUB9 

JUB91 

JUB92 

JUB93 

JUB94 

JUB95 

JUB96 

JUB97 

JUB98 

JUB99 

JUB100 

JUNS03 

JUNS06 

JUHEO1

JUHE03 

JUHE05 

JUHE07 

JUS024 

JUS05 

JUS07 

JUS09 

JUSO11

JUS013 

JUS015 

JUS017 

JUS019 

JUS021 

JUS025 

JUMO1

JUM03 

JUM05 

JU08 

JU03 

JU05 

JU07 


Sequence (5'→3')


GATTGTGAGCGATTTGCGTGC 

TGTGATGGTGGATTGTCGCCG 

GTTATATATGATTGATCCTGC 

TGATTATACTGGTAGTCTTGC 

GCACAATCTTCAGGTGTTTTG 

AGTTGCTAGGTGTGTCAGATG 

CTTATATAGTAGTGGAGAGTG 

GTTCTGATTTTTCATTAGCGG 

ACAGCTCCTGAAGATGATGAC 

TTAACCTTATGTGATTGGCAG 

ATGCTATGTTCTTTTATGGTG 

TATTATTGGGCATGGTATGTC 

AAGTTATGTATTGTTAGAGCG 

TATTTTGAGTGTACTGGAGGC 

TGCTTGCCTATTACAACATGC 

CCTGCTAGATTTGTATCGTTG 

ACTCAGCGTATTATTAAAGCC 

GTTGACGATGGTGGTGACAGC 

AATCTACACCACAGAAATTGC 

TGGCAGGATTTGATATGTTAG 

GGTTTTACCCATTGTTTGCTC 

GCTTGTTCTATGATCGTGATG 

CTGTGCTCGTAAAAGTTGTTC 

AATAAGCAGATGGCTAATGTC 

AAGGTTTTATCCGTCTTCCAG 

CACTTACAATGGCTAGTTATG 

CCGTCTCAACTTCATTCTTGC 

CTTGTGGATCTGTTGGTTATG 

GGCTTCTACATTTTTGTTTAG 

TGTATTTCACAGATATACCTC 

CAGCAGTTAAAACAGCTAGAG 

ATAATGAGGTATCTGCTACTG 

TGTTAAACCCGATGCTACCAC 

AGAGAGATGAAATGCTATGAG 

TTGGATTGCGAATTGTATGTG 

TTGTATGATTTACGCACTTGC 

CATTATCATTTGAGGAGCAGG 

GAGGCATGTTGTTCGCAAAGC

ATATAAGTGCCTTTCAACAGG 

CAAAAGTTTACTGATGAGTCC 

TTATTGTGAAGATCATAAGCC 

TCAACACATTGGTATGAAACG 

AGTTCATTGTGTTTTAGGGTC

TATTGGTGATTCTGCTGTTAC 

TAGAACTGGTTACTATGGTTG 

TTTTGAGGCACATAAGGACTC 

AGTCAAGACTGGTCATTATAC 

CGTTCTAATAATGGCGTTTAC 

TAGGCTTGTACCGAAGACAGC 

ATACTCAGTTATGTCAATATC 

TCGAGACAAGTTAGCTCTGGG 

TTAATGATATGGTTTATTCCC 

GTGTAGAAGAATTGCATGACG 

AACAATTCTTGGTTCTTCC 

TTTAGGAGTTTTCACTTTACC 

TCATGGAGATGCTGGTTTTAC 

ATTAATAACCCTGATTTACCC 

AATGTATAGTGAGTTCCCTGC 

CTTTCACACTATTATGTCATG 

TATTCAGGCAGACTCATTTAC 

AATTGAATGGTTCGTGTGTAG 

GTGTTTGTGTTAATTATGACC 

TAGGTAGTGGTTACTGTGTGG 

GAATGGTGTTACTCTTAGCAC 

TGGATGTGCTAAGTCAAAATC 

ATTTCTGTGGTAATGGTAATC 

TGGCACCAGATTTGTCACTTG 

GTGATGATTATACTGGATACC 

TAGTTGCCATTTGTTTATTGG 

ATGTGGATTGTGTATTTTGTG 

ATGTCTTTTACTCCTGGTAAGC 

TCTACTGGGTCGCTAGTAACC 

CCCACAGTTCCCCATTCTTGC 

CTCTCTATCAGAATGGATGTC 


Location (nt) 

Negative-strand 

primer 

1-20 

JUB12 

380-400 

JUB11 

662-682 

JUB111 

983-1003 

JUB112 

1488-1508 

JUB113 

1801-1821 

JUB115

2254-2274 

JUB15 

2557-2577 

JUB151 

3093-3113 

JUB152 

3621-3641 

JUB153 

3700-3720 

JUB13 

3959-3979 

JUB131 

4310-4330 

JUB132 

4782-4802 

RACEJUB1

5128-5148 

JUB6 

5319-5339 

JUB61 

5913-5933 

JUB62 

6264-6284 

JUB63 

6701-6721 

JUB64 

7036-7056 

JUB18

7177-7197

JUB181

7678-7698

JUB182

8072-8092

JUB183

8403-8423

JUB184

9043-9063 

JUB65 

9540-9560 

JUB66 

9925-9945 

JUB67 

10378-10398 

JUB68 

10901-10921 

JUB8 

11443-11463 

JUB81 

12084-12104 

JUB82 

12547-12567 

JUB83 

13073-13093 

JUB84 

13559-13579 

JUB85 

14066-14086 

JUB86 

14465-14485 

JUB87 

14856-14876 

JUB88 

15230-15250 

JUB89 

15636-15656 

JUB810 

16040-16060 

JUB811 

16309-16329 

JUB812 

16903-16923 

JUB10

17498-17518 

JUB101 

18034-18054 

JUB102 

18562-18582 

JUB103 

18997-19017 

JUB104 

19498-19518 

JUB105 

19859-19879 

JUB106 

20313-20333 

JUB107 

20736-20756 

JUB108 

21067-21087 

JUB109 

21396-21416 

JUB110

21807-21827 

JUS02 

22226-22244 

JUS04 

22588-22608 

JUS06 

22959-22979 

JUSO8

23337-23357 

JUSO10 

23482-23502 

JUS012 

23993-24013 

JUS014 

24378-24398 

JUS016 

24752-24772 

JUS018 

25123-25143 

JUSO20 

25498-25518 

JUS022 

25867-25887 

JUHE02 

26234-26254 

JUHE04 

26632-26652 

JUHE06 

27016-27036 

JUHEO8

27382-27402 

JUNS02 

27649-27669 

JUNS04 

28169-28189 

JU02 

28662-28682 

JU04 

29079-29100 

JU06 

29512-29532 

JUM02 

29999-30019 

JUM04 

30487-30507 

JUM06 


Sequence (5'→3')


ACATCACCTGTAGCTGTTGGC 

CCAGTAACGTCTGTAACCTTC 

CCATGCCTCTTGCCATTGAAC 

TGAATTTAACACCATCAACAG 

ATCCTCTTGATTATTGCTAAC 

AGGGTTTACAACAACTTCTGC 

AACAGCCATAGAATGACTATC 

GCTTTAGGCACATACAGACCC 

ACCACAGCATAAAATTCCTCC 

CAAATTCTTCTACGTCCAATC 

AATGCTTGACCACTTACTGCC 

CAAGCAAAACCATCTATCATG 

CTCCAAGTAGGAAATAATGCC 

GGAGCAAATCATATCCACCTC 

CCTCTAAATGTCTGCTTGTAC 

ATAGCAGCCAACAGTGTTTCC 

AGGGTCACTGTAAGAACAAGC 

CTATTAAAAGCAACATCAGAC 

AGAAAGTATGGGGTAAACTTG 

ATCTCTACCACAAAAGGTCCC 

TGGTAGGGACATTAAACACTG 

GCTTTCTTCAATTTATGCTGG 

CATAGACAAAAATGTATCCAC 

AAGTATTACCTGGTTTATAAG 

ACCTAACACTCCAATGTAATG 

CTAAAAGTGTTCTTAATCCAC 

TATACTAACAAGAGTCATACC 

ACATCACCTGTAGCTGTTGGC 

TAACTCTGCTTAAAGGCTTCC 

GATGTTTGAGAAGAGCAGACC 

CATAATATCCTGTGACAAAGG 

TATAACTACAGGAACACCACG 

AAAAGTGCTTCAGATCAACTG 

ATACTCCAGTGCTTAAAATAC 

TCCTTGCGTACAATGTGTGGC 

TCCACAAACTTGACAAACATC

TCTTGCTAGTGTGTTACAACC 

CTTGGATAGTTTGAATCTGCC 

ACTAGAACACGCCTCATCAAG 

CTTAAAATTAAGCATAAGGGC 

AGAACAAATCATGGTTAATGC 

AAATGGGTAAGTGGAAAATTG 

AGAGCTAACTTGTCTCGAATC 

GCAAACACTCTTACTACCACC 

GCGGTGGACCTCTTATCATCG 

GTATTGTAAGACTCTAAGTAC 

TACAAAAGAGTCTTAACAGAC 

CCCATGTAACTAGCACAACAC 

TTATATTTGTCATCTACTGCC 

TTATGCCACAAAGGGTTAGCC 

GGTAGTAAACACATACTTACG 

CTGCGGAACAAGCGTAGGAGC 

AAGAACTTTAACAAATGCTAG 

ATCAAGTGACAAATCTGGTGC 

ATATGATTACCATTACCACAG 

TGAATAGCATAAAGGGCATTG 

GCTGCCTAGACAACCTAATAC 

AATGGCTCAAAATTAGTAAAC 

TTATAATAAGTCGCATTAACC 

ACAAGTTAAATAATTAGTACC 

ATAGTTATGCTGGAAAAACAC 

TAGAAGTGAGAGGTGTAACCC 

TATAGGATGTATTTACAAAAG 

ACATAATAAGTACCCAAACC 

CATTATCATACCTAAAAACGC 

AGACCATAAATAACACCAGTG 

ATGATAAGGCGTAAAATTAAC 

GAAACAACATTGGTAGGAGGG 

TTCCTTAATGGACAGTGCTGC 

GCAGCAAGACATCCATTCTG 

TACCAAAACACTGCTGAACAG 

ATACCATCGTGGCAGCAGTTG 

CCACTTGAGGATGCCATTACC 

TACACAATCCACATAATAATG 

TATAAAAATTATTTGCCCCAC 


Location (nt) 


6071-6091 

5750-5770

5277-5297 

4918-4938 

4458-4478 

4086-4106 

3876-3896 

3395-3415 

2915-2935 

2399-2419 

1991-2011 

1427-1447 

1049-1069 

312-332 

11111-11131 

10724-10744 

10205-10225 

9752-9772 

9453-9473 

9213-9233 

8761-8781 

8345-8365 

7950-7970 

7543-7563 

7252-7272 

6926-6946 

6531-6551 

6071-6091 

16525-16545 

16117-16137 

15522-15542 

15050-15070 

14604-14624 

14158-14178 

13648-13668 

13245-13265 

12843-12863 

12448-12468 

12048-12068 

11649-11669 

11245-11265 

21544-21564 

21064-21084 

20414-20434 

19908-19928 

19369-19389 

18973-18993 

18444-18464 

17983-18003 

17599-17619 

17153-17173 

16808-16828 

27754-27774 

27384-27404 

27020-27040 

26678-26698 

26307-26327 

25937-25957 

25553-25573 

25188-25208 

24809-24829 

24436-24456 

24031-24051 

23778-23797 

23408-23428 

23037-23057 

22660-22680 

22416-22436 

21917-21937 

30495-30514 

29920-29940 

29430-29450 

29136-29156 

28655-28675 

28150-28170


a Underlined nucleotides indicate mismatched bases with regard to the genome sequence of the HCoV-OC43 ATCC strain.




ST-JEAN ET AL. 


J. Virol. 


[Fig. 1 graphic: linear map of the genome from 5' to 3' showing the 5' UTR, ORF1a and ORF1b
(replicase), ns2, HE, S, ns12.9, E, M, N, and the 3' UTR, with the six PCR products spanning
nt 1-6091, 5319-11131, 10901-16545, 16309-21564, 21330-27774, and 27649-30764.]

FIG. 1. Schematic representation of the HCoV-OC43 genome and of the amplification strategy used for sequencing. The HCoV-OC43 genome
is 30,713 nt long and comprises nine main ORFs: ORF1a, ORF1b, ns2 (the gene encoding the nonstructural protein 2), HE
(hemagglutinin-esterase gene), S (spike gene), ns12.9 (the gene encoding a nonstructural protein of 12.9 kDa), E (small envelope gene),
M (membrane gene), and N (nucleocapsid gene). The replicase gene includes both ORF1a and ORF1b. The entire genome was amplified in six
fragments in order to be sequenced. Each PCR product was named according to the name of the primers used for the amplification, and the
location in the genome is indicated above or below each PCR product. Boxes: open, gene encoding the replicase polyprotein; dotted,
genes encoding nonstructural proteins; shaded, genes encoding structural proteins; black, UTRs. GR, GeneRacer.


of the intersection of ORF1a and ORF1b. The slippery sequence UUUAAAC (nt 13334 to 13340)
was found at the 3' end of ORF1a and is thought to be involved, in combination with RNA
pseudoknot structures, in the frameshift, which would occur at nt C13340 (58).
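Locating such a heptanucleotide slip site in a genome is a simple exact-match search. The Python sketch below reports every occurrence of a motif with 1-based inclusive coordinates, the convention used in this paper. It runs on a short made-up RNA string rather than on the HCoV-OC43 sequence, and the function name is hypothetical; an exact-match hit on its own says nothing about whether a site is functional.

```python
def find_motif(rna: str, motif: str = "UUUAAAC"):
    """Return 1-based inclusive (start, end) coordinates of every exact motif occurrence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    start = rna.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = rna.find(motif, start + 1)  # allow overlapping occurrences
    return hits


# Toy sequence (hypothetical, for illustration only).
toy_rna = "GGCAUUUAAACGGGUUUAAACCC"
print(find_motif(toy_rna))
```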

High degree of identity between the HCoV-OC43 ATCC strain and the Paris isolate. Differences
in nucleotides and amino acids between the HCoV-OC43 ATCC strain and the Paris isolate are
presented in Table 4. In all, only 6 nt differ between the two variants. These mutations are
located in the 5'UTR, in the ns2, S, M, and N genes, and in the 3'UTR. According to the
MFOLD software, mutations located in the UTRs would not affect RNA folding (data not shown)
and would therefore not have any effect on viral transcription and replication. Mutations
located in the ns2 and M genes would not affect virus biology since they do not give rise to
any amino

TABLE 3. Organization of the HCoV-OC43 genome

Genome region            Location (nt)    TRS location (nt)    TRS sequence^a
Leader and 5'UTR         1-209            63-69                UCUAAAC...139 nt...AUG
ORF1a                    210-13361
ORF1b^b                  13361-21496
Intergenic region        21497-21505
ns2 gene                 21506-22342      21492-21498          UCUAAACUUUAAAAAUG
Intergenic region        22343-22353
HE gene                  22354-23628      22339-22344          UUAAACUCAGUGAAAAUG
Intergenic region        23629-23642
S gene                   23643-27704      23636-23642          UCUAAACAUG
Intergenic region        27705-27791
ns12.9 gene              27792-28121      27771-27777          UCUUAAGGCCACGCCCUAUUAAUG
E gene^c                 28108-28362
Intergenic region        28363-28376
M gene                   28377-29069      28367-28373          UCCAAACAUUAUG
Intergenic region        29070-29078
N gene                   29079-30425      29065-29070          UCUAAAUUUUAAGGAUG
3'UTR                    30426-30713
Poly(A) tail of 28 nt    30714-30741

a Nucleotides in boldface indicate TRS sequences, whereas underlined nucleotides indicate the initiation codon.
b Putative ribosomal -1 frameshift between ORF1a and ORF1b.
c ORF overlap for the ns12.9 gene and the E gene.
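The coordinates in Table 3 can be checked for internal consistency: consecutive regions should abut without gaps, apart from the two overlaps flagged in footnotes b and c. The Python sketch below performs that check; the coordinates are copied from Table 3 (the poly(A) tail is omitted), and the function and variable names are illustrative.

```python
# (region, first nt, last nt), 1-based inclusive coordinates from Table 3.
REGIONS = [
    ("Leader and 5'UTR", 1, 209),
    ("ORF1a", 210, 13361),
    ("ORF1b", 13361, 21496),
    ("Intergenic region", 21497, 21505),
    ("ns2 gene", 21506, 22342),
    ("Intergenic region", 22343, 22353),
    ("HE gene", 22354, 23628),
    ("Intergenic region", 23629, 23642),
    ("S gene", 23643, 27704),
    ("Intergenic region", 27705, 27791),
    ("ns12.9 gene", 27792, 28121),
    ("E gene", 28108, 28362),
    ("Intergenic region", 28363, 28376),
    ("M gene", 28377, 29069),
    ("Intergenic region", 29070, 29078),
    ("N gene", 29079, 30425),
    ("3'UTR", 30426, 30713),
]


def adjacency_report(regions):
    """Return (gaps, overlaps) between consecutive regions as (left, right, size in nt)."""
    gaps, overlaps = [], []
    for (left, _, left_end), (right, right_start, _) in zip(regions, regions[1:]):
        distance = right_start - left_end - 1  # 0 means the two regions abut exactly
        if distance > 0:
            gaps.append((left, right, distance))
        elif distance < 0:
            overlaps.append((left, right, -distance))
    return gaps, overlaps


gaps, overlaps = adjacency_report(REGIONS)
print("gaps:", gaps)
print("overlaps:", overlaps)
print("last annotated nt:", REGIONS[-1][2])
```

With these coordinates the annotation covers nt 1 to 30713 without gaps, and the only overlaps are the single shared nucleotide between ORF1a and ORF1b and the 14-nt overlap between the ns12.9 and E genes.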
FIG. 2. Schematic representation of the polyprotein 1ab putative proteolytic processing and of the main domains found in ORF1ab. The
approximate positions of predicted functional domains and protease cleavage sites in ORF1ab are shown, and amino acid positions are also
indicated. The white arrows indicate putative cleavage sites recognized by either PLP1 or PLP2, whereas black arrows indicate sites
recognized by the main protease, 3CLpro. The 15 putative cleavage products generated by the proteolytic processing are named as follows: leader
protein, MHV p65-like protein, nsp1 (PL1, X, PL2, and T1), T2, nsp2 (3CLpro), nsp3 (T3), nsp4, nsp5, nsp6, nsp7, nsp9 (RdRp), nsp10 (HEL),
nsp11, nsp12, and nsp13 (15). A putative ribosomal -1 frameshift is indicated between ORF1a and ORF1b. Upstream of the frameshift site, the
slippery sequence UUUAAAC (nt 13334 to 13340) is found. PL1 and PL2, accessory protease domains; X, conserved domain of unknown function; T1, T2,
and T3, membrane-spanning (hydrophobic) domains; 3C, 3CLpro domain; Z, putative zinc finger; HEL, NTPase RNA helicase domain; ND,
domain conserved exclusively in nidoviruses. nsp, Nonstructural protein.


acid substitution. However, two mutations lead to amino acid 
substitutions. The first is located at nt 26514, in the S2 subunit 
of the S gene, and gives rise to the I958F (ATCC->Paris) 
mutation, whereas the second is located at nt 29320, in the N 
gene, and gives rise to the V81A mutation. 
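
The relationship between nucleotide position, codon, and amino acid change can be made explicit with a small Python sketch. It is an added illustration, not the authors' procedure: the function names `locate_in_gene` and `consequence` are hypothetical, the gene start coordinates are those listed for the M and N genes in Table 3, the codons are those listed in Table 4, and the amino acid assignments come from the standard genetic code. It assumes that each gene's first listed nucleotide is the first base of its initiation codon.

```python
# Codons listed in Table 4, with their standard-genetic-code amino acids.
CODON_TO_AA = {
    "TTC": "Phe", "TTT": "Phe",
    "ATC": "Ile",
    "ACT": "Thr", "ACC": "Thr",
    "GTA": "Val", "GCA": "Ala",
}

# Gene start coordinates (1-based) from Table 3; only M and N are used
# because their boundaries are given in the rows shown in this section.
GENE_START = {"M": 28377, "N": 29079}


def locate_in_gene(nt_position, gene_start):
    """Return (residue number, position within the codon), both 1-based."""
    offset = nt_position - gene_start
    return offset // 3 + 1, offset % 3 + 1


def consequence(atcc_codon, paris_codon):
    """Return (ATCC residue, Paris residue, 'synonymous' or 'nonsynonymous')."""
    atcc_aa = CODON_TO_AA[atcc_codon]
    paris_aa = CODON_TO_AA[paris_codon]
    kind = "synonymous" if atcc_aa == paris_aa else "nonsynonymous"
    return atcc_aa, paris_aa, kind


if __name__ == "__main__":
    print("nt 28808 in M:", locate_in_gene(28808, GENE_START["M"]))
    print("nt 29320 in N:", locate_in_gene(29320, GENE_START["N"]))
    for atcc, paris in [("TTC", "TTT"), ("ATC", "TTC"),
                        ("ACT", "ACC"), ("GTA", "GCA")]:
        print(atcc, "->", paris, consequence(atcc, paris))
```

With these inputs, nt 28808 maps to the third base of codon 144 of the M gene and nt 29320 to the second base of codon 81 of the N gene, which matches the residue numbers and codon changes listed in Table 4.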

Neuroinvasion in BALB/c mice. After inhalation of virus, 
mice were processed for histochemical labeling of HCoV- 
OC43 ATCC antigens. Cells positive for viral antigens were 
first observed ca. 3 dpi in the olfactory bulb as patches of 
labeled neurons (Fig. 3A). No cells positive for viral antigens 
could be seen in other parts of the brain, even near perivascular 
blood cells. At 7 dpi, viral antigens were detected in all brain 
regions, indicating rapid dissemination throughout the CNS 
(Fig. 3B). Five mice of each group, infected with HCoV-OC43 
ATCC or Paris, were sacrificed every 48 h, and virus titers 
were measured in the CNS and lung. Even though mice inhaled 
a viral suspension, virus was rarely found in the lung 
(the limit of detection was 10^1.5 TCID50/g because of the toxicity 
of lung extracts for HRT-18 cells) and only when brain titers reached at 
least 10^4 TCID50/g (data not shown). HCoV-OC43 ATCC 
infectious virus could be detected in mouse CNS as early as 2 


TABLE 4. Sequence differences between the reference strain 
HCoV-OC43 ATCC and the Paris isolate 

Mutation         Region of    Consequence of mutation
location (nt)    mutation     (ATCC->Paris)

31               5'UTR        C->T
22243            ns2 gene     TTC (Phe246)->TTT (Phe246) (no amino acid change)
26514            S gene       ATC (Ile958)->TTC (Phe958)
28808            M gene       ACT (Thr144)->ACC (Thr144) (no amino acid change)
29320            N gene       GTA (Val81)->GCA (Ala81)
30632            3'UTR        C->A


dpi. Virus titers were maximal ca. 4 dpi and remained high 
throughout the experiments (Fig. 3C). When virus reached the 
brain, replication of HCoV-OC43 ATCC led to a fatal encephalitis. 
Infectious HCoV-OC43 Paris could only be detected in 
mice starting at 6 dpi (Fig. 3C), and a lower number of mice 
were productively infected by HCoV-OC43 Paris than by 
HCoV-OC43 ATCC. Nevertheless, when infectious virus 
reached the brain, infectious virus titers were comparable for 
the two HCoV-OC43 variants, suggesting that the ATCC and 
Paris variants both exhibit neuroinvasive and neurotropic 
properties. 

Comparison of HCoV-OC43 with other coronaviruses. HCoV- 
OC43 is part of the second genetic group of coronaviruses (30) 
and displays higher identity levels with virus strains that belong 
to this group, including SARS-HCoV. The coronavirus strains 
that present the highest degree of identity with HCoV-OC43 
are BCoV and MHV-A59, with 95 and 71% identity, respectively. 
HCoV-OC43 and BCoV are closely related at the nucleotide 
level, and most of the differences between the two genomes 
are found in the S1 subunit of the S gene, suggesting 
that the two virus strains possess similar biological properties 
but display a different cellular tropism. SARS-HCoV and 
HCoV-229E display 53.1 and 51.2% identity, respectively, with the 
HCoV-OC43 strain. Although SARS-HCoV is apparently part 
of group 2 (51), its overall identity level with the OC43 strain 
is not striking, but the two strains present a very high degree of 
amino acid identity in some important functional domains, 
such as the RdRp, the RNA helicase, and 3CLpro. 

An unrooted phylogenetic tree grouping seven coronavirus 
strains from the three genetic groups was obtained by using the 
complete genome sequences of all strains (Fig. 4). This tree is 
the first one that includes the complete genome of the HCoV- 
OC43 strain. It shows that HCoV-OC43 and BCoV are 
evolutionarily closely related and that they form a clade with MHV- 
FIG. 3. Neuroinvasive properties of the HCoV-OC43 ATCC and HCoV-OC43 Paris variants in BALB/c mice after intranasal inoculation. (A) At 
3 dpi, cells positive for viral antigens (arrows) were first observed in the olfactory bulb (OB). No infected cells could be detected in the cortex or 
other brain structures, illustrating transneuronal spreading of the virus. (B) At 7 dpi, the virus has disseminated to the entire CNS, as illustrated 
by the presence of immunopositive cells throughout the brain. Magnification (A and B), ×32. (C) Quantification of infectious virus in the brain 
of each mouse at different times postinfection. Virus titers are presented as the logarithm of the TCID50 per gram of tissue (the limit of detection 
was 10^0.5 TCID50/g). Infection by HCoV-OC43 ATCC was detected in one mouse as early as 2 dpi, and gradually more mice became positive. 
HCoV-OC43 ATCC infectious particles were found between 2 and 8 dpi in mouse brain and led to fatal encephalitis before the end of the 
experiment. Antigens of the HCoV-OC43 Paris variant were first revealed in mouse brain at 6 dpi. Infectious particles were detected in some 
of the brains up to 10 dpi. HCoV-OC43 Paris infectious titers in susceptible animals were similar to those found after HCoV-OC43 ATCC 
infection, and mice positive for either variant presented all pathological and clinical signs of encephalitis. 


A59. Although SARS-HCoV is apparently part of group 2 
(51), the analysis shows that it is more divergent from strains of 
the previous clade and that infectious bronchitis virus (IBV) 
and SARS-HCoV display the highest divergence among the 
strains analyzed. Group 1 coronaviruses likewise group 
together in a clade. 

A BLAST analysis with coronaviruses from all three genetic 
groups showed that different degrees of identity exist between 
several regions of different virus strains but that the most 
conserved region among all coronaviruses is located within 
ORF1b (data not shown). A more stringent BLAST analysis was 
carried out on the genome sequences of HCoV-229E, MHV- 
A59, and SARS-HCoV with the HCoV-OC43 genome as a 
reference (data not shown). Among all genes analyzed, the 
most significant identity levels were found in ORF1ab, as well 
as in the S2 subunit of the S gene. Significant identity levels 
were observed with MHV-A59 and SARS-HCoV. With regard 
to ORF1ab, the identity levels were usually lower in 
ORF1a than in ORF1b and were more significant in the case 
of MHV-A59 than for SARS-HCoV. Moreover, identity was 
more significant at the amino acid level. Several domains that 
are essential for viral replication, such as the 3CLpro, RdRp, 
and helicase domains, are of particular interest because of their 
functional importance. The S2 subunit of the S gene also dis- 



FIG. 4. Unrooted phylogenetic tree grouping seven complete 
coronavirus genomes from the three genetic groups. Circles group 
the members of each of the three genetic groups. The 0.1 scale bar represents 
the genetic distance between the species (i.e., nucleotide substitution 
units per studied site). Strains: MHV-A59 (NC_001846); BCoV, bovine 
coronavirus Quebec strain (AF220295); SARS-HCoV, SARS-HCoV 
Tor2 strain (AY274119); IBV, IBV Beaudette strain (NC_001451); 
TGEV (NC_002306); HCoV-229E (NC_002645). 
FIG. 5. Multiple alignment of the amino acid sequences of the main proteases of coronaviruses from all three genetic groups. Positions with absolute 
conservation are shaded, whereas residues of the putative catalytic dyad, His41 and Cys145, are boxed. The conservation level among group 2 
coronaviruses was ca. 46.2%, whereas all strains displayed 26% identity. Strains: OC43, HCoV-OC43 (group 2); BCoV, BCoV Quebec (group 2); 
MHV-A59, MHV (group 2); SARS Tor2, SARS-HCoV Tor2 (group 2); 229E, HCoV-229E (group 1); IBV, IBV (group 3). 


played high identity levels with its counterparts from other 
coronaviruses. For instance, the MHV-A59 S2 subunit displayed 
76% identity and 88% similarity, whereas the S1 subunit 
presented only 53% identity and 65% similarity. This 
result is logical since it has been shown that the membrane 
fusion function resides within the S2 subunit (54, 62), whereas 
the S1 subunit is involved in receptor binding (56) and determination 
of tropism (3), which differs from one virus to 
another. 

Since SARS-HCoV is now considered a serious, recently 
emerged pathogen and since we believe that HCoV-OC43 
could represent an excellent model for the study of this virus, 
it was of interest to analyze some functionally important motifs 
that display significant identity levels with the HCoV-OC43 
genome. The most striking identities between the two strains 
were found mainly in ORF1b, although the 3CLpro motif, in 
ORF1a, also presented a significant identity level. The cleavage 
product containing the 3CLpro motif displayed 48% identity 
and 64% similarity with the corresponding region of 
HCoV-OC43. Of the three viral proteases that play a role in 
the processing of polyprotein 1ab, 3CLpro is the main 
protease (67). This domain of the viral genome is essential for 
replication since it cleaves the HCoV-OC43 polyprotein 1ab at 
11 sites and allows the release of important functional domains 
(Fig. 2) (32). Like other coronavirus 3CLpros, HCoV-OC43 
3CLpro acts via a catalytic dyad, which is composed of 
His41 and Cys145 (6). The HCoV-OC43 3CLpro is 303 amino acids 


long and displays an outstanding conservation among coronaviruses 
from the three genetic groups (Fig. 5). Seventy-nine 
residues are strictly conserved among sequences from six different 
coronaviruses, corresponding to 26% identity among all 
3CLpro sequences analyzed, whereas group 2 coronaviruses 
display 46.2% identity with each other for the same motif. 
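
The 26% figure follows directly from the counts given in the text (79 strictly conserved residues over a 303-residue protease). The Python sketch below is an added illustration of how such a value can be computed from a multiple alignment; it is not the authors' pipeline. The function `percent_identity`, its `reference_length` parameter, and the toy alignment are hypothetical, and the choice of denominator (alignment width or reference protein length) is an assumption that the text does not specify.

```python
def percent_identity(aligned, reference_length=None):
    """Percentage of strictly conserved positions in a multiple alignment.

    aligned: equal-length strings, with "-" marking gaps.
    reference_length: denominator to use; defaults to the alignment width.
    A column counts as conserved only if every sequence carries the same
    residue and that residue is not a gap.
    """
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    conserved = sum(
        1
        for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )
    denominator = reference_length if reference_length is not None else width
    return 100.0 * conserved / denominator


# Toy alignment, invented for illustration only (not real 3CLpro sequences).
TOY_ALIGNMENT = ["MKVA-", "MKIA-", "MRVAG"]

# Figures quoted in the text: 79 conserved residues, 303-residue 3CLpro.
REPORTED_IDENTITY = 100.0 * 79 / 303

if __name__ == "__main__":
    print(f"toy alignment: {percent_identity(TOY_ALIGNMENT):.1f}%")
    print(f"79 of 303 residues: {REPORTED_IDENTITY:.1f}%")
```

Dividing 79 by 303 gives about 26.1%, consistent with the 26% reported for the six 3CLpro sequences.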

DISCUSSION 

HCoV-OC43 belongs to the second genetic group of coronaviruses 
and represents the HCoV that is most closely related to 
SARS-HCoV. Here, we present the first report of a complete 
sequence of the HCoV-OC43 genome, including the complete 
sequence of a clinical respiratory isolate of the OC43 serotype. 
The two genomes are 30,713 nt long and differ by only 6 nt, 
two of which give rise to amino acid substitutions, located in the S 
(I958F) and N (V81A) protein genes. The genomes of the two 
virus variants display 71, 53.1, and 51.2% identity with the 
genomes of MHV-A59, SARS-HCoV Tor2, and HCoV-229E, 
respectively. Using bioinformatics tools and well-characterized 
coronaviruses, further characterization of the HCoV-OC43 genome 
was performed, and these analyses revealed that HCoV- 
OC43 is closely related to BCoV and MHV and that it displays 
significant amino acid identity levels with important functional 
domains of SARS-HCoV. Like the ATCC strain that was 
isolated in the 1960s, HCoV-OC43 Paris, isolated in 2001, 
exhibited neuroinvasive properties in BALB/c mice. Although 
mice were more easily infected with the ATCC strain than with 
the Paris isolate, these results suggest that both viruses possess 
the intrinsic ability to infect neural cells and to reach the CNS 
from the periphery. 

Recently, L. Vijgen and coworkers submitted a complete 
sequence of the HCoV-OC43 genome to GenBank 
(NC_005147). The virus strain used for this sequencing is described 
as corresponding to the virus strain that was used in our 
laboratory (VR-759). However, comparison of our sequence 
with theirs shows that they differ at 33 positions, with 29 differences 
located in the S gene, including two in the S2 
subunit. Of the four other differences, one is located at the 
beginning of the genome sequence, where a guanine is added 
with respect to our sequence, whereas the other three are 
scattered throughout ORF1a. Despite these differences, the 
availability of the complete genome sequence from a clinical 
isolate reinforces the validity of our sequence, since the 
HCoV-OC43 ATCC and Paris sequences differ by only 6 nt. 
Therefore, this observation suggests that the viral strain used 
by Vijgen and collaborators could have been adapted in cell 
culture, given the differences observed in the S gene, which is 
known to be associated with viral adaptation (27). No differences 
were noticed among the ORF1b sequences of HCoV- 
OC43 ATCC, Paris, and the one from Vijgen and coworkers. 
This observation suggests that this region of the genome needs 
a high degree of conservation in order to remain functional and 
that genes located downstream of the replicase gene are more 
permissive to sequence modifications. 

Using a recent HCoV-OC43 clinical respiratory isolate, we 
showed here that HCoV-OC43 apparently remains genetically 
stable in the environment. Indeed, despite virus shedding and 
chances of persistence in the host, the HCoV-OC43 Paris isolate 
displays differences at only six positions with regard to the 
ATCC strain sequence, even though about 40 years elapsed 
between the two isolations. Since viral persistence could be 
associated with molecular adaptation (7, 8), the low rate of 
mutation observed here could be explained by the fact that the 
HCoV-OC43 Paris isolate has never or rarely persisted before. 
However, it is too soon to speculate about such an issue given 
that the exact origin of the virus before its isolation remains 
undetermined. It is also worth noting that viral persistence 
does not necessarily require an adaptation to the environment 
(2) and that, despite the high rate of mutation of the coronavirus 
RdRp (1), 95% of the mutations engendered by RNA 
virus polymerases are deleterious and therefore not conserved 
(42). 
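
To give a sense of scale for this stability, the figures stated in the text (6 differences, 30,713 nt, about 40 years between isolations) can be turned into a pairwise identity and a crude number of differences per site per year. The Python sketch below is an added back-of-the-envelope illustration, not an evolutionary rate estimate: the Paris isolate is not assumed to descend directly from the ATCC strain, the 40-year interval is approximate, and the names used are hypothetical.

```python
GENOME_LENGTH_NT = 30_713   # genome length excluding the poly(A) tail
DIFFERENCES = 6             # positions that differ between ATCC and Paris
YEARS_APART = 40            # "about 40 years"; approximate by assumption


def pairwise_identity(differences, length):
    """Percent identity between two sequences of equal length."""
    return 100.0 * (length - differences) / length


def differences_per_site_per_year(differences, length, years):
    """Crude observed divergence; not a calibrated substitution rate."""
    return differences / length / years


if __name__ == "__main__":
    identity = pairwise_identity(DIFFERENCES, GENOME_LENGTH_NT)
    crude = differences_per_site_per_year(
        DIFFERENCES, GENOME_LENGTH_NT, YEARS_APART
    )
    print(f"pairwise identity: {identity:.2f}%")
    print(f"differences per site per year: {crude:.2e}")
```

Under these assumptions, the two genomes are about 99.98% identical, which corresponds to roughly 5 x 10^-6 observed differences per site per year.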

Our observation that inhalation of HCoV-OC43 led to a 
generalized infection of the whole CNS in mice demonstrates 
neuroinvasiveness. This result confirms that HCoVs have neuroinvasive 
properties in mice, which was first shown in newborn 
mice (10, 22) and which is consistent with their detection 
in human brain (9, 12, 40, 53). After inhalation, the first infected 
cells were detected in the olfactory bulb, illustrating that 
virus directly reached the brain by a transneuronal route, as 
already demonstrated for MHV (10, 31, 47). The HCoV-OC43 
Paris isolate, which was never propagated in mouse brain or 
other neurological tissue, also exhibited neuroinvasive properties 
in mice. Replication within the CNS was similar for the two 
variants, but fewer mice were infected by the HCoV-OC43 
Paris isolate than by the ATCC strain. These data suggest that 


only one mutation in the S gene, giving rise to one amino acid 
modification, could partially modulate the neuroinvasiveness 
of one variant over the other. Indeed, a single amino acid 
change has already been demonstrated to influence the ability 
of MHV to spread within the CNS (43, 59). 

Although the degree of sequence conservation between the 
genomes of the HCoV-OC43 ATCC and Paris variants is very 
high, their phenotypes seem to differ slightly in mice, since the 
ATCC strain reached the CNS more easily. As we have demonstrated 
in vitro with primary hippocampal and cortical cell 
cultures, both HCoV-OC43 ATCC and Paris variants were 
able to replicate in rodent neurons, although the HCoV-OC43 
ATCC strain yielded more infectious virus particles than the 
HCoV-OC43 Paris isolate. However, the two viral variants 
exhibited different biological properties, such as plaque formation 
and cytopathic effects on different cell lines (H. Jacomy 
and P. J. Talbot, unpublished data). 

Although both mutations preserve some but not all properties 
of the parental residues, the I958F mutation introduces 
a phenylalanine that does not display the same steric 
hindrance as the isoleucine, which could potentially affect 
protein folding and function. Moreover, the I958F mutation is 
located in the S2 subunit of the S gene and would probably be 
positioned in the putative fusion peptide domain (23), which 
would give this mutation considerable impact at the biological level. On 
its own, this mutation could therefore have the capacity to 
influence the phenotype of the HCoV-OC43 Paris isolate because 
it may interfere with the fusion process in a positive or a 
negative manner (43). Given the known involvement of the S 
protein in viral biology and pathogenesis (7, 8, 15, 48), this 
mutation is more likely to influence the phenotype of the Paris 
isolate. It has been reported that the N protein may be involved 
in viral RNA synthesis (30) and that it could colocalize 
with nucleolar antigens and delay the cell cycle (14). However, 
the V81A mutation within the N gene is positioned 
in domain I of the protein and should therefore not influence the 
RNA binding properties of N, since this functional feature of 
the protein lies in domain II (45). Therefore, even though the 
role of both mutations needs to be investigated, we feel that 
the S mutation is more likely to influence the virus phenotype. 

Comparison with better-characterized coronaviruses (23, 
59) suggests that the I958F mutation is located in the putative 
S fusion peptide and could therefore affect viral fusogenic 
properties and phenotype. Although no fusion peptides have 
formally been identified in any coronavirus S protein, predictions 
have located such fusion sequences near the N terminus 
of the heptad repeat 1 (HR1) for MHV (33). Studies with the 
MHV-A59 S protein also showed that mutations introduced in 
the HR1 region severely affected cell-cell fusion ability (33). 
Moreover, it has already been reported that a single mutation 
introduced in HR1 could influence the degree of MHV virulence 
(59). Depending on the effect of the mutation on cleavage 
ability, the phenotype of the resulting virus could also be 
affected. Although the cleavage of the S protein is not absolutely 
required for fusion (23, 52, 55), it has been shown to 
enhance fusogenicity (55). Thus, inhibition of S-protein cleavage 
would be associated with a more stable interaction between 
S1 and S2 and would correlate with a loss of fusogenicity 
(25). So, as observed by Tsai et al. (59) for the MHV-JHM 
strain, the I958F mutation in the S gene of the HCoV-OC43 
Paris isolate could either alter the conformation of the S protein 
or affect its cleavage, impairing the ability 
of the virus to spread within the CNS. 

An animal model for the HCoV-OC43 ATCC strain has 
recently been developed and optimized in our laboratory (22). 
Moreover, HCoV-OC43 may also be used as a model for the 
study of SARS-HCoV, not only because of the identity level 
the two virus strains display but also because HCoV-OC43 can 
be studied without the requirement of a level three, aerosol- 
aware, biological confinement. Indeed, we have now demonstrated 
that the two virus strains present a high level of conservation 
for some essential functional domains, especially 
within 3CLpro, the RdRp, and the RNA helicase. This result is 
consistent with the possible sharing of several important properties 
by these two viruses. All of these motifs represent potential 
candidates for therapy of coronavirus-mediated diseases 
because they are specific targets and because of the 
specificity they exhibit toward their substrate. Indeed, substrate 
specificities of all coronavirus proteases, and mainly 3CLpro, 
are conserved among the three established groups (19), and 
this is also true for SARS-HCoV. The picornavirus RdRp (21) 
and viral proteases (17) have notably been designated as such 
targets for antiviral therapy. At present, the SARS-HCoV 
3CLpro enzyme represents the most promising target for 
SARS therapy (58). The availability of 3CLpro crystal structures 
should provide a valuable tool for rapid identification of 
potential drugs against SARS. Thus far, 3CLpro crystal structures 
have been obtained for transmissible gastroenteritis virus 
(TGEV) (5), HCoV-229E (6) and, more recently, for SARS- 
HCoV (61). A putative in vitro inhibitor has also been identified 
for TGEV (5) and SARS-HCoV (61). This inhibitor, 
hexapeptidyl chloromethyl ketone, was shown to bind the 
3CLpro enzyme very efficiently in vitro and, although it provides 
an excellent structural basis for drug design, in vivo 
experiments still need to be performed. 

Now that the complete genome sequence of HCoV-OC43 
has been deciphered, it will provide a very useful tool for the 
study of coronaviruses from all genetic groups and particularly 
for those of group 2, including SARS-HCoV. Indeed, the genome 
sequence will allow comparative studies with other coronavirus 
strains and RNA viruses and will also allow optimization 
of prediction models. This sequence will also allow the 
assembly of an infectious cDNA clone of HCoV-OC43, which 
is currently under way. Thus far, cDNA clones have been 
assembled for several coronavirus strains by using different 
approaches. Among these clones, those of TGEV (3, 64), 
HCoV-229E (57), IBV (13), MHV-A59 (66), and even SARS- 
HCoV (65) are now available. The HCoV-OC43 clone will 
provide an invaluable tool to further understand the underlying 
mechanisms of replication and pathogenesis of HCoVs. 